Abstract

In this paper we address the problem of fast and fair transmission of flows
in a router, which is a fundamental issue in networks like the Internet. We
model the interaction between a TCP source and a bottleneck queue with
the objective of designing optimal packet admission controls in the
bottleneck queue. We focus on the relaxed version of the problem obtained
by relaxing the fixed buffer capacity constraint that must be satisfied at
all time epochs. The relaxation allows us to reduce the multi-flow problem
into a family of single-flow problems, for which we can analyze both
theoretically and numerically the existence of optimal control policies of
special structure. In particular, we show that for a variety of parameters,
TCP flows can be optimally controlled in routers by so-called index
policies, but not always by threshold policies. We have also implemented
index policies in Network Simulator-3 and tested their applicability in
real networks in a simple topology. The simulation results show that the
index policy covers a wide range of desirable properties with respect to
fairness between different versions of TCP models, fairness across users with
different round-trip times, and the minimum buffer required to achieve full
utilization of the queue.

Keywords: queue management; Markov decision process; TCP modeling; 
index policies; 



1 Introduction 

This paper deals with congestion control and buffer management, two of the 
most classical research problems in networking. The objective of congestion
control is to control the traffic injected into the network in order to avoid
congestive collapse. Most traffic in the Internet is governed by TCP/IP (Transmission
Control Protocol and Internet Protocol) ((SHE]). TCP protocol tries to adjust 
the sending rate of a source to match the available bandwidth along the path. 
In the absence of congestion signals from the network, TCP gradually increases
the congestion window over time, and upon the reception of a congestion signal
TCP reduces the congestion window, typically by a multiplicative factor. Buffer
management determines how congestion signals are generated. Congestion signals
can be either packet losses or ECN (Explicit Congestion Notification) marks ([25]).
At the present state of the Internet, nearly all congestion signals are generated
by packet losses. Packets can be dropped either when the router buffer is full
or when an AQM (Active Queue Management) scheme is employed ([9]).

In this paper we develop a rigorous mathematical framework to model the 
interaction between a TCP source and a bottleneck queue with the objective of 
designing optimal packet admission controls in the bottleneck queue. The TCP 
sources follow the general family of Additive Increase Multiplicative Decrease 
pattern that TCP versions like New Reno or SACK follow (0), but to keep the 
Markovian model simple, we ignore the slow-start phase. A TCP source is thus
characterized by the multiplicative decrease factor $\gamma$, which determines the
reduction of the congestion window in the event of a packet loss (in TCP New
Reno $\gamma$ takes the value 1/2). The objective is to design a packet admission
control strategy that uses the resources efficiently and provides a satisfactory
user experience. Mathematically, we formulate the problem using the Markov
Decision Process (MDP) framework ([24]) as a resource allocation problem, which
extends the restless bandit model introduced in [31]. The router knows or can
infer the congestion window of each flow and aims at maximizing the total
aggregated utility. As the utility function we adopt the parameterized family of
generalized $\alpha$-fair utility functions, which, depending on the value of
$\alpha$, recovers a wide variety of utilities such as max-min, maximum
throughput and proportional fairness ([22, 3]). Because the fixed bandwidth
capacity constraint must be satisfied at all time epochs, the problem can be
solved analytically only in simplistic scenarios. However,
the problem becomes tractable if the fixed capacity constraint is relaxed so that
the bandwidth allocation must be satisfied only on average. This relaxation
allows us to view congestion control at the router as a family of per-flow
admission control problems, thus reducing a multi-flow problem to a
family of single-flow problems. As our main contribution, for the single-flow
problem we analyze both theoretically and numerically the existence of optimal
control policies of special structure. In particular, we show that for a variety of
parameters, TCP flows can be optimally controlled in routers by so-called index
policies, but not always by threshold policies.

This solution approach based on the relaxation has gained notable success
in recent years and has been used, for instance, in website morphing, knapsack
problems, job scheduling in wireless networks, and military applications. We refer
to [10] for a recent account of the methodology and applications. The interest
in the approach is justified by mathematical results showing that, under
some additional assumptions, the heuristic is asymptotically optimal ([29]). In
practice it has been reported on various occasions that the heuristic provides a
close-to-optimal solution to the problem studied ([10, Chapter 6]).

The index policy requires that each TCP source is assigned an index that
depends on its current congestion window and on the TCP variant implemented
at the source. When a packet arrives and the buffer is full, the buffer drops
the packet with the smallest index. If more than one packet shares
the smallest index, the packet that has been in the queue the longest is
dropped.
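
The drop rule above can be sketched in a few lines of Python. This is only an illustration, not the NS-3 implementation: the `Packet` record, the `enqueue` helper, and the choice to let the arriving packet compete with the queued ones are assumptions made for this example, and the index values are taken as given.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    flow_id: int
    index: float  # transmission index of the packet's flow (assumed given)
    seq: int      # arrival order; a smaller value means longer in the queue


def enqueue(queue: List[Packet], packet: Packet, capacity: int) -> Optional[Packet]:
    """Add `packet` to the FIFO `queue`; return the dropped packet, if any."""
    if len(queue) < capacity:
        queue.append(packet)
        return None
    # Buffer full: the victim has the smallest index; ties are broken in
    # favour of dropping the packet that has waited the longest (smallest seq).
    # Assumption: the arriving packet is itself a candidate for dropping.
    victim = min(queue + [packet], key=lambda p: (p.index, p.seq))
    if victim is not packet:
        queue.remove(victim)
        queue.append(packet)
    return victim
```

With a buffer of two packets holding indices 0.5 and 0.3, a new arrival with index 0.3 evicts the older packet of index 0.3, while an arrival with index 0.1 is itself dropped.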

We would like to note that our scheme requires no more information
exchange than the related congestion control protocols XCP ([H]), ACP ([7])
and RCP ([IS]). Moreover, these congestion control protocols aim at achieving
max-min fairness, whereas our scheme allows us to achieve the more general
$\alpha$-fairness. Max-min fairness is the particular case of $\alpha$-fairness in
which the parameter $\alpha$ goes to infinity. Other important TCP variants are FAST TCP ([30]), TCP
CUBIC (dJ) and TCP Compound ([27]), which are implemented in Linux and 
Windows, respectively. The main originality of our approach with respect to pre- 
viously existing variants is that our index protocol aims at achieving efficiency 
and fairness on shorter-term time scales, by maximizing the mean fairness of 
the actual rate instead of the fairness of the mean rates. 

We summarize the main contributions of our solution approach: 

• We address previously used notions of fairness by using a generalized
$\alpha$-fairness framework.

• We achieve inter-protocol fairness between different versions of AIMD
TCP.

• We achieve fairness with respect to the round-trip time on the scale of
congestion periods.

We have implemented our solution in NS-3 (lj) and performed extensive 
simulations in a benchmark topology to explore and validate the properties 
of the algorithm, and to assess the improvement with respect to DropTail and
RED buffers. In the simulations we focus on the case $\alpha = 1$, since it was
shown that the current Internet (with DropTail) maximizes the aggregate sum
of logarithmic utilities of the time-average transmission rates ([18]).

The simulation results show that the algorithm has several desirable prop- 
erties with respect to fairness and efficiency: 

• We improve fairness across users with respect to DropTail and RED, with 
respect to different TCP models and with respect to users with different 
round-trip-time. 

• In a queue implementing index policies, a smaller buffer size is needed to 
get the same throughput. 

We believe that our approach opens a new avenue of research on the combined
design of congestion control and queue management policies. This
paper represents a first attempt, and more research will be required in order to
assess several aspects that we did not consider, for instance, the performance
of the algorithm in the presence of short-lived TCP connections, or its
performance when implemented in many routers in the network.
We also believe that we provide a fundamental framework and ideas that are useful
in, or directly applicable to, next-generation Internet architectures such as
Information-Centric Networking (ICN) and wireless networks.

The rest of the paper is organized as follows. In Section 2 we put our work in the
context of the existing literature. In Section 3 we describe the model and in
Section 4 we state the problem. Section 5 shows the benefits of relaxing the
described problem. In Section 6 we analyze the index policies for single-flow
problems and establish several of their properties. With these
results in hand, we present in Section 7 the conclusions of the simulations carried
out with NS-3. Finally, some conclusions are drawn and possible extensions are
discussed in the final section.

A three-page extended abstract of this paper appears in [3].

2 Related Work on TCP and Buffer Management

The seminal work by [Bj established the basis of TCP, and [T5] added several key 
features and brought the TCP protocol very close to its current form. TCP is a 
completely distributed algorithm run by the end hosts, and it aims to share the 
resources of the network among all flows in an efficient and fair way. In other 
words, assuming that the network is responsible for the end-to-end transmission 
of packets, TCP tries to determine the fair share of the connections. The latter 
is characterized by the congestion window, denoted by cwnd, which captures the 
number of packets a connection can have simultaneously in the network at any 
given time. The basic principles according to which TCP works are extremely 
simple. The key idea is the dynamic window sizing scheme proposed by [15].
Consider that every packet has an identifier that allows the receiver to identify it
uniquely. The sender sends packets to the destination, and the destination sends
back to the sender small packets (known as acknowledgements, or simply ACKs)
acknowledging the correct reception of each packet. This simple mechanism
allows the sender to infer whether its packets are reaching the destination and,
implicitly, the congestion level of the network. While
packets reach the destination, the sender is allowed to increase the number of
packets per unit of time it injects into the network. Roughly speaking, TCP
increases the value of cwnd by one in the absence of packet losses. On the
other hand, when a packet is lost, the receiver notices it and sends a
special acknowledgment back to the source (known as a duplicate ACK, or simply
dupACK). Upon the reception of three dupACKs, the source infers that the
network is congested and takes two actions: first, the lost packet is
retransmitted and second, the packet sending rate is reduced by a factor
$1/\gamma$. In the last two decades there has been a huge effort to improve
and develop new congestion control algorithms, some of which we briefly mentioned
in the introduction. An extensive overview and comparison of different
TCP versions is given in [20].

Buffer management algorithms determine how packets are dropped, or more
generally how congestion signals are generated. Packets can be dropped either
when the router buffer is full, which is referred to as DropTail, or when an AQM
(Active Queue Management) scheme is employed ([9]). At the present state of
the Internet, nearly all congestion signals are generated by packet losses. An
alternative to packet losses is ECN (Explicit Congestion Notification) ([25]),
which marks packets but does not drop them. Despite the tremendous research
effort, it seems that, given the ambiguity in the choice of parameters, AQM
schemes are rarely used in practice. On the other hand, in basic DropTail
routers, the buffer size is the only parameter to tune apart from the
router capacity. We refer the interested reader to [3H] and references therein 
for more information on the problem of the optimal choice of buffer size. The main
drawbacks of the basic DropTail mechanism are the synchronization of TCP
flows and RTT unfairness: TCP connections with shorter RTT reach
high transmission rates faster than TCP connections with longer RTT. Our solution
improves fairness for users with different RTTs compared with the DropTail policy.

Our approach to AQM, based on the sound theoretical background of Markov
Decision Processes, does not require tuning of many parameters and yields
two benefits: higher network utilization and fairer
data transmission among users with different TCP models.

3 Problem description 

We describe in this section the congestion control problem of multiple flows at a 
bottleneck router. Suppose there are K flows trying to deliver packets to their 
destinations via a bottleneck router with the following parameters: 

• C, the bandwidth, i.e., the deterministic link capacity (in packets per
second);

• B, the buffer size (in packets).

We suppose that the router is capable of correctly associating each arriving packet
with the corresponding flow.

Suppose further that flow $k \in \mathcal{K} := \{1, 2, \dots, K\}$ has implemented an
additive-increase/multiplicative-decrease (AIMD) mechanism as in the Transmission
Control Protocol (TCP). The congestion window cwnd is adapted according
to received acknowledgements: for each received non-duplicate acknowledgment
(positive acknowledgement), cwnd is increased by the reciprocal of the
current value of cwnd (which approximately corresponds to an increase by one
packet during a round-trip time RTT without lost packets) unless it has reached
the maximum advertised congestion window allowed, $N_k$:

$$ \mathrm{cwnd} := \min\left\{ \mathrm{cwnd} + \frac{1}{\mathrm{cwnd}},\; N_k \right\}; \tag{1} $$

for each triple duplicate acknowledgment (negative acknowledgment), cwnd is
decreased multiplicatively using the formula

$$ \mathrm{cwnd} := \max\left\{ \lfloor \gamma_k \cdot \mathrm{cwnd} \rfloor,\; 1 \right\}, \tag{2} $$

where $0 < \gamma_k < 1$ is the multiplicative decrease factor and $\lfloor \cdot \rfloor$
denotes the floor function. We consider that, independently of the number of dropped
packets, the congestion window is decreased only once per RTT. The flow starts
at the (deterministic) initial congestion window value $n_k$ and it always has
packets to send. For transparency we omit the slow-start phase of
TCP.
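
As a minimal sketch, the two window updates (1) and (2) can be written as follows. The function names `on_ack` and `on_triple_dupack` and the parameter names `n_max` (for $N_k$) and `gamma` (for $\gamma_k$) are hypothetical and chosen only for this illustration.

```python
import math


def on_ack(cwnd: float, n_max: float) -> float:
    """Update (1): additive increase by 1/cwnd, capped at the maximum window."""
    return min(cwnd + 1.0 / cwnd, n_max)


def on_triple_dupack(cwnd: float, gamma: float) -> int:
    """Update (2): multiplicative decrease with floor, never below one packet."""
    return max(math.floor(gamma * cwnd), 1)
```

For example, with `gamma = 0.5` a window of 9 packets shrinks to 4 after a triple duplicate acknowledgment, and a window of 1 stays at 1.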

The objective for the router is to use the resources efficiently and provide 
satisfactory user experience so that 

• the overall number of delivered packets per long time intervals (time- 
average throughput) is as large as possible and 

• the flows are treated fairly by having their congestion windows (number 
of packets in the network) as equal as possible and 

• the utilization of the bottleneck queue is as high as possible 



4 Markov Decision Processes Model 

In this section we present a general mathematical formulation of the congestion
control problem using a Markov Decision Process (MDP) framework of resource
allocation ([12]), which extends the restless bandit model ([31]). To the best of
our knowledge, the restless bandit formulation has not been used previously in
congestion control, an area in which an important body of literature is devoted
to deterministic fluid models ([26]).

In order to make the model tractable and to be able to design a simple,
implementable solution, we consider several simplifications of the features of
the problem. Nevertheless, the performance of the solution is then evaluated in
Section 7 in the original setting of the problem, without these simplifications.

Let us consider time slotted into discrete time epochs $t \in \mathcal{T} := \{0, 1, 2, \dots\}$,
which correspond to time periods of one round-trip time (RTT), assumed
equal for all the flows. We thus consider users with equal RTT in the
theoretical part of this paper; the simulations in Section 7 drop this constraint
and suggest that it can be omitted.

We assume that all packets are of the same size, which we further define to
be one bandwidth capacity unit. The router takes decisions about admitting or
rejecting the flows at every time epoch $t$.

Figure 1: A scheme of K flows sharing a bottleneck router.



4.1 AIMD Flows 

Every flow can be allocated either the capacity required by its current congestion
window (being admitted) and transmitted, or zero capacity (being rejected). We
denote by $\mathcal{A} := \{0, 1\}$ the action space, where 0 corresponds to blocking and 1
corresponds to admitting. This action space is the same for every flow $k$.
Each flow $k$ is defined independently of other flows as the tuple

$$ \left( \mathcal{N}_k,\ (\boldsymbol{W}_k^a)_{a \in \mathcal{A}},\ (\boldsymbol{R}_k^a)_{a \in \mathcal{A}},\ (\boldsymbol{P}_k^a)_{a \in \mathcal{A}} \right), $$

where

• $\mathcal{N}_k := \{1, 2, \dots, N_k\}$ is the state space, i.e., the set of possible
congestion windows flow $k$ can set;

• $\boldsymbol{W}_k^a := (W_{k,n}^a)_{n \in \mathcal{N}_k}$, where $W_{k,n}^a$ is the expected
one-period capacity consumption (in number of packets), or work, required by flow
$k$ at state $n$ if action $a$ is decided at the beginning of a period; in particular,
$W_{k,n}^0 := 0$ and $W_{k,n}^1 := n$;

• $\boldsymbol{R}_k^a := (R_{k,n}^a)_{n \in \mathcal{N}_k}$, where $R_{k,n}^a$ is the expected
one-period generalized $\alpha$-fairness, or reward, earned by flow $k$ at state $n$ if
action $a$ is decided at the beginning of a period; in particular, $R_{k,n}^0 := 0$ and

$$ R_{k,n}^1 := \begin{cases} \dfrac{(1+n)^{1-\alpha} - 1}{1-\alpha}, & \text{if } \alpha \neq 1, \\ \log(1+n), & \text{if } \alpha = 1; \end{cases} $$

• $\boldsymbol{P}_k^a := (p_{k,n,m}^a)_{n,m \in \mathcal{N}_k}$ is the flow-$k$ stationary
one-period state-transition probability matrix if action $a$ is decided at the
beginning of a period, i.e., $p_{k,n,m}^a$ is the probability of moving to state $m$
from state $n$ under action $a$; in particular, $p_{k,n,m}^1 = 1$ iff
$m = \min\{n+1, N_k\}$ (representing additive increase) and $p_{k,n,m}^0 = 1$ iff
$m = \max\{\lfloor \gamma_k \cdot n \rfloor, 1\}$ (representing multiplicative decrease);
the remaining probabilities are zero.

Figure 2: A model of an AIMD flow as a Markov chain. The arrows represent
one-period transitions among the states $1, 2, \dots, N$ after a congestion-free
(ACK) and a congestion-experienced (NACK) transmission.

The dynamics of flow $k$ is thus captured by the state process $X_k(\cdot)$ and the
action process $a_k(\cdot)$, which correspond to state $X_k(t) \in \mathcal{N}_k$ and action
$a_k(t) \in \mathcal{A}$, respectively, at all time epochs $t \in \mathcal{T}$. The states
$n \in \mathcal{N}_k$ denote possible levels of the sending rate. In particular,
$W_{k,n}^1 := n$ can therefore be interpreted as the bandwidth capacity the flow
requires for complete transmission in the current period.

The schematic behavior of the AIMD flow as a Markov chain is shown in
Figure 2, where "ACK" represents a congestion-free delivery of the flow packets to
the receiver (positive acknowledgments) and "NACK" represents a congestion- 
experienced transmission (negative acknowledgments). Notice that the evolu- 
tion is completely deterministic; this can be extended to stochastic evolution 
like for instance in [Hj . 
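
The ingredients of the single-flow model (work, reward and deterministic transitions) can be sketched as plain functions. The names `work`, `reward`, `next_state`, `n_max` (for $N_k$) and `gamma` (for $\gamma_k$) are hypothetical and introduced only for this illustration.

```python
import math


def work(n: int, action: int) -> int:
    """One-period work: n packets if admitted (action 1), zero if rejected."""
    return n if action == 1 else 0


def reward(n: int, action: int, alpha: float) -> float:
    """One-period generalized alpha-fairness reward in state n."""
    if action == 0:
        return 0.0
    if alpha == 1.0:
        return math.log(1 + n)
    return ((1 + n) ** (1 - alpha) - 1) / (1 - alpha)


def next_state(n: int, action: int, n_max: int, gamma: float) -> int:
    """Deterministic transition: additive increase or multiplicative decrease."""
    if action == 1:
        return min(n + 1, n_max)
    return max(math.floor(gamma * n), 1)
```

Starting from state 1 with `n_max = 4` and `gamma = 0.5`, three admissions followed by two rejections visit the states 2, 3, 4, 2, 1.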



4.2 Multi-flow Optimization Problem 

The flow dynamics is as follows (see Figure 1). At epoch $t$, the sender of each flow
$k \in \mathcal{K}$ sets its state $X_k(t)$ (which depends on whether the previous-epoch
workload was transmitted, as signalled by the positive or negative acknowledgement
that the receiver of flow $k$ sent back to the sender in the previous period) and
sends the workload of $W^1_{k,X_k(t)}$ packets to the bottleneck router. However,
only $0 \le W^{a_k(t)}_{k,X_k(t)} \le W^1_{k,X_k(t)}$ packets are allowed to queue in the
buffer (admitted) for being transmitted. The transmitted packets arrive at the receiver
of flow $k$, who obtains the fairness (reward) $R^{a_k(t)}_{k,X_k(t)}$. If the router admitted
the flow, then the receiver sends a positive acknowledgement back to the sender;
otherwise a negative acknowledgment is sent. According to this, the sender sets
its next-epoch state $X_k(t+1)$ and repeats the process.

The congestion avoidance decision at the router is taken in the following
way. At epoch $t$ the router observes the states (congestion windows) $X_k(t)$ of
all flows $k \in \mathcal{K}$. Based on that, it decides the flow actions $a_k(t)$ (which may be
viewed as being taken in virtual gates, as illustrated in Figure 1), instantaneously
appends (in FIFO order) $W^{a_k(t)}_{k,X_k(t)}$ packets of each flow $k$ to the buffer, and
transmits (in FIFO order) $W$ packets (or all the packets if there are fewer than $W$
packets in the buffer) during the period.

To summarize, the senders make no decisions and therefore the flow
dynamics can be modeled as a Markov chain.

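We now present a generic formulation of the congestion control
optimization problem. Let $\Pi$ be the set of all history-dependent randomized
policies. Denote by $\mathbb{E}^{\pi}_{\boldsymbol{n}}$ the conditional expectation given that the
initial conditions are $\boldsymbol{n} := (n_k)_{k \in \mathcal{K}}$ and the policy applied is
$\pi \in \Pi$. The router controller's problem under the discounted criterion
(if $\beta < 1$) is

$$ \max_{\pi \in \Pi}\ \mathbb{E}^{\pi}_{\boldsymbol{n}} \left[ \sum_{t=0}^{\infty} \sum_{k \in \mathcal{K}} \beta^t R^{a_k(t)}_{k,X_k(t)} \right] \tag{3} $$

$$ \text{subject to}\quad \mathbb{E}^{\pi}_{\boldsymbol{n}} \left[ \sum_{t=0}^{\infty} \sum_{k \in \mathcal{K}} \beta^t W^{a_k(t)}_{k,X_k(t)} \right] \le \frac{W}{1-\beta}, \tag{4} $$

$$ \sum_{k \in \mathcal{K}} W^{a_k(t)}_{k,X_k(t)} \le B, \quad \text{for all } t \in \mathcal{T}. \tag{5} $$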



The problem can also be formulated under the time-average criterion. It is
known by [531 Lemma 7.1.8] that the optimal policy for the time-average criterion
can be obtained as the limit $\beta \to 1$ of the optimal policy for the discounted
problem.

The virtual capacity $W$, seen as the target time-average router throughput, is
equal to the bandwidth-delay product $C \times \mathrm{RTT}$, that is, the maximum number
of packets that can be served in one slot. We note that a formulation analogous to
constraint (4) was used in [21]. To avoid the trivial problem of an underloaded
router, we assume that

$$ \mathbb{E}^{\pi}_{\boldsymbol{n}} \left[ \sum_{t=0}^{\infty} \sum_{k \in \mathcal{K}} \beta^t W^{1}_{k,X_k(t)} \right] > \frac{W}{1-\beta}. $$



5 Decomposition of the Multiple-Flows Problem

The problem (3)-(5) is difficult to solve due to the sample path constraint (5).
One possibility for relaxing the problem is to assume that the buffer space $B$ is
infinite, so that the constraint (5) is trivially fulfilled. Another possibility is to
relax that constraint as done in [31], by requiring it only in discounted expectation,
i.e.,

$$ \mathbb{E}^{\pi}_{\boldsymbol{n}} \left[ \sum_{t=0}^{\infty} \sum_{k \in \mathcal{K}} \beta^t W^{a_k(t)}_{k,X_k(t)} \right] \le \frac{B}{1-\beta}. $$

However, such a constraint is weaker than (4), because $B > W$, since in our
model $B$ is the total number of packets that the buffer can handle in one RTT,
whereas $W$ corresponds to the number of packets that can be served in one
RTT.

Either of these two relaxation possibilities results in omitting the constraint
(5). Thus, we end up with the problem formulation (3)-(4), which is analogous
to the Whittle relaxation of the multi-armed restless bandit problem ([31]).

The standard approach to such a formulation is to solve, for each $\nu$, the
Lagrangian relaxation of (3)-(4), which is

$$ \max_{\pi \in \Pi}\ \mathbb{E}^{\pi}_{\boldsymbol{n}} \left[ \sum_{t=0}^{\infty} \sum_{k \in \mathcal{K}} \beta^t \left( R^{a_k(t)}_{k,X_k(t)} - \nu W^{a_k(t)}_{k,X_k(t)} \right) \right] + \nu \frac{W}{1-\beta}, \tag{6} $$

where $\nu$ is the Lagrangian parameter, which can be interpreted as a per-packet
transmission cost. Lagrangian theory ensures that there exists $\nu^*$ for which
the Lagrangian relaxation (6) achieves the optimum of (3)-(4). Since for any fixed
$\nu$ the flows are independent and the second term of (6) is constant, we can
decompose (6) into $K$ individual-flow problems.

Proposition 1. Let $\Pi_k$ be the set of all history-dependent randomized policies
for flow $k$, and let the individual-flow policies $\pi_k^* \in \Pi_k$ form the joint
policy $\pi^* \in \Pi$. If for a given parameter $\nu$ each policy $\pi_k^*$,
$k \in \mathcal{K}$, optimizes the individual-flow problem

$$ \max_{\pi_k \in \Pi_k}\ \mathbb{E}^{\pi_k}_{n_k} \left[ \sum_{t=0}^{\infty} \beta^t \left( R^{a_k(t)}_{k,X_k(t)} - \nu W^{a_k(t)}_{k,X_k(t)} \right) \right], \tag{7} $$

then $\pi^*$ optimizes the multi-flow problem (6).

In Section 6 we will find an optimal solution to such a $\nu$-parameter problem in
terms of flow- and state-dependent Whittle indices $\nu_{k,n}$, which in our setting
can be interpreted as transmission indices. If the optimal transmission cost $\nu^*$
is known, then these indices define the following optimal policy for problem
(3)-(4): "At each time slot admit all the flows whose actual-state transmission index
is greater than the transmission cost $\nu^*$ and reject the remaining flows".

Since in practice $\nu^*$ is typically unknown, the buffer space is finite, and it is
desirable to have work-conserving transmission in order to increase bandwidth
utilization, we use the transmission indices to define a practically feasible policy
for problem (3)-(5):

Heuristic Policy: In every slot, order the flows in decreasing order of
their current indices, and admit the flows until reaching the
constraint (5).

Following the results in the literature, we expect that our heuristic policy
will be asymptotically optimal as the number of flows and the buffer capacity grow
to infinity ([10]).

In Section 7 we will develop a heuristic admission control policy for a TCP/IP
network at packet level and study its performance with NS-3 simulations. In
this case the index of each packet represents the admission priority into the
buffer. If the buffer is full, the packet in the queue with the smallest transmission
index will be dropped.
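
The flow-level heuristic policy can be sketched as a greedy admission loop. This is an illustration under stated assumptions: the helper name `heuristic_admission` is hypothetical, the index values are taken as given, and admission stops at the first flow whose window would violate the buffer constraint (5), which is one possible reading of "until reaching the constraint".

```python
from typing import Dict, Hashable


def heuristic_admission(
    windows: Dict[Hashable, int],
    indices: Dict[Hashable, float],
    buffer_size: int,
) -> Dict[Hashable, int]:
    """Return action 1 (admit) or 0 (reject) for every flow in one slot."""
    # Flows with the largest current transmission index are considered first.
    order = sorted(windows, key=lambda k: indices[k], reverse=True)
    actions = {k: 0 for k in windows}
    used = 0
    for k in order:
        if used + windows[k] > buffer_size:
            break  # assumption: stop at the first flow that does not fit
        actions[k] = 1
        used += windows[k]
    return actions
```

For windows `{1: 4, 2: 3, 3: 5}`, indices `{1: 0.2, 2: 0.5, 3: 0.1}` and a buffer of 8 packets, flows 2 and 1 are admitted and flow 3 is rejected.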

6 Index Policies for Single-Flow Subproblems 

The single-flow admission control subproblem can be optimally solved by means
of index policies under some index-existence conditions ([23]). In a particular
case, the optimal policy is of threshold type (if it is optimal to reject the flow
under a certain congestion window, then it is optimal to do the same also under
any higher congestion window). However, depending on the parameters, threshold
policies may not be optimal, as will be illustrated in Section 6.2. Therefore, in the
rest of the paper we will obtain index policies only numerically, and the
investigation of whether index policies can be characterized by index values in closed
form is left out of this paper. (This is indeed possible in some cases, see, e.g. 

muni). 

Since no decisions are taken by the sender, congestion control is implemented
in the router. The router decides whether the incoming flow in state $n_k$ should
be admitted (and transmitted), which is achieved by action $a_k(t) = 1$ of transmitting
$W^1_{k,n_k}$ packets, or rejected, which is achieved by action $a_k(t) = 0$ of
transmitting 0 packets. The difference between receiver rewards and transmission
costs (i.e., $R^0_{k,n_k} - \nu W^0_{k,n_k}$ and $R^1_{k,n_k} - \nu W^1_{k,n_k}$) will
henceforth be called the net reward under transmission cost $\nu$.

To summarize, the (unconstrained) MDP problem of the single AIMD flow
addressed in this section under both the discounted and time-average criteria is
defined as follows:

• State space is $\mathcal{N}_k$.

• Actions: admitting and rejecting are available in each state.

• Dynamics if admitting: If the flow is in state $n_k$ and is admitted in
a given period, then during that period it generates net reward
$R^1_{k,n_k} - \nu W^1_{k,n_k}$ and the flow moves to state $n_k + 1$ (or remains in
$N_k$, if $n_k = N_k$) for the next period.

• Dynamics if rejecting: If the flow is in state $n_k$ and is rejected in a
given period, then during that period it generates net reward
$R^0_{k,n_k} - \nu W^0_{k,n_k}$ and the flow moves to state
$\max\{\lfloor \gamma_k \cdot n_k \rfloor, 1\}$ (in particular, it remains in $n_k$ if
$n_k = 1$) for the next period.
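
For a fixed transmission cost, the single-flow MDP summarized above can be solved numerically by standard value iteration. The sketch below is not the index algorithm used in this paper; it is a generic dynamic-programming illustration, and the names `solve_single_flow`, `n_max`, `gamma`, `alpha`, `beta`, `nu` and the fixed iteration count are assumptions of the example (it requires `beta < 1`).

```python
import math
from typing import List, Set, Tuple


def reward(n: int, alpha: float) -> float:
    """Generalized alpha-fairness reward of admitting a flow in state n."""
    if alpha == 1.0:
        return math.log(1 + n)
    return ((1 + n) ** (1 - alpha) - 1) / (1 - alpha)


def solve_single_flow(
    n_max: int, gamma: float, alpha: float, beta: float, nu: float,
    iterations: int = 2000,
) -> Tuple[Set[int], List[float]]:
    """Return (set of states where admitting is strictly better, state values)."""

    def q_value(n: int, action: int, values: List[float]) -> float:
        if action == 1:  # admit: net reward now, window grows by one
            return reward(n, alpha) - nu * n + beta * values[min(n + 1, n_max)]
        # reject: zero net reward, window decreases multiplicatively
        return beta * values[max(math.floor(gamma * n), 1)]

    values = [0.0] * (n_max + 1)  # values[0] is unused padding
    for _ in range(iterations):
        values = [0.0] + [
            max(q_value(n, 0, values), q_value(n, 1, values))
            for n in range(1, n_max + 1)
        ]
    admit = {
        n for n in range(1, n_max + 1)
        if q_value(n, 1, values) > q_value(n, 0, values)
    }
    return admit, values[1:]
```

A negative cost makes admitting optimal in every state, while a very large cost makes rejecting optimal everywhere; intermediate costs produce the admission sets studied in the following subsections.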

To evaluate a policy $\pi$ under the $\beta$-discounted criterion, we consider the
following two measures. Let

$$ \mathbb{W}^{\pi}_{i} := \mathbb{E}^{\pi}_{i} \left[ \sum_{t=0}^{\infty} \beta^t W^{a(t)}_{X(t)} \right] $$

be the expected total $\beta$-discounted bandwidth utilization if starting from state
$i$ under policy $\pi$. For convenience, we will also call $\mathbb{W}^{\pi}_{i}$ the
expected total $\beta$-discounted work, since the bandwidth utilization can be seen as
the work performed by the router in order to transmit the flow. Analogously, we
denote by

$$ \mathbb{R}^{\pi}_{i} := \mathbb{E}^{\pi}_{i} \left[ \sum_{t=0}^{\infty} \beta^t R^{a(t)}_{X(t)} \right] $$

the expected total $\beta$-discounted reward if starting from state $i$ under policy $\pi$.
The objective (7) is, for each transmission cost $\nu$,

$$ \max_{\pi \in \Pi}\ \mathbb{R}^{\pi}_{i} - \nu \mathbb{W}^{\pi}_{i}. \tag{8} $$

We will address this problem in the following subsections, noting that an
optimal solution for the time-average variant is obtained in the limit $\beta \to 1$.



6.1 Threshold Policies and Indexability 

Since for finite-state MDPs there exists an optimal stationary policy independent
of the initial state ([21]), we narrow our focus only to those policies and
represent them via admission sets $\mathcal{S} \subseteq \mathcal{N}$. In other words, a policy
$\mathcal{S}$ prescribes to admit the flow in states in $\mathcal{S}$ and to reject the flow in
states in $\mathcal{S}^{\mathsf{C}} := \mathcal{N} \setminus \mathcal{S}$.

We will therefore write $\mathbb{R}^{\mathcal{S}}_{i}$ and $\mathbb{W}^{\mathcal{S}}_{i}$ for the
expected total $\beta$-discounted reward and work, respectively, under policy
$\mathcal{S}$ starting from initial state $i$. Then, formulated for initial state $i$, the
optimization problem is the following combinatorial problem:

$$ \max_{\mathcal{S} \subseteq \mathcal{N}}\ \mathbb{R}^{\mathcal{S}}_{i} - \nu \mathbb{W}^{\mathcal{S}}_{i}. \tag{9} $$

In order to solve this problem, we will be interested in two related structural 
properties, which are optimality of threshold policies and indexability, formally 
defined below. 

Definition 1 (Optimality of Threshold Policies). We say that problem (9) is
optimally solvable by threshold policies, if for every real-valued $\nu$ there exists a
threshold state $n \in \mathcal{N}_k \cup \{0\}$ such that the threshold policy admitting the
flow in states $\mathcal{S}_n := \{m \in \mathcal{N}_k : m \le n\}$ and rejecting otherwise is
optimal for problem (9).




Of interest to us is the index proposed in [31], which often furnishes a
nearly-optimal solution, and typically recovers the optimal index rule if such a
rule exists. We adopt the definition of indexability from [13].

Definition 2 (Indexability). We say that the $\nu$-parameter problem is indexable,
if there exist unique values $-\infty \le \nu_{k,n} \le \infty$ for all $n \in \mathcal{N}_k$ such
that the following holds for every state $n \in \mathcal{N}_k$:

1. if $\nu_{k,n} \ge \nu$, then it is optimal to admit flow $k$ in state $n$, and

2. if $\nu_{k,n} \le \nu$, then it is optimal to reject flow $k$ in state $n$.

The function $n \mapsto \nu_{k,n}$ is called the (Whittle) index, and the $\nu_{k,n}$'s are
called the (Whittle) index values.

An immediate consequence of the two definitions is formulated in the following
previously known result.

Proposition 2. If problem (9) is indexable and the index is nonincreasing, i.e.,
$\nu_{k,1} \ge \nu_{k,2} \ge \dots \ge \nu_{k,N_k}$, then problem (9) is optimally solvable by
threshold policies. Moreover, for a given $\nu$ the optimal threshold policy is
$\mathcal{S}_{n^*}$ with $n^* \in \mathcal{N}_k \cup \{0\}$ such that
$\nu_{k,n^*} \ge \nu \ge \nu_{k,n^*+1}$ (defining $\nu_{k,0} := \infty$, $\nu_{k,N_k+1} := -\infty$).

For transparency, we limit ourselves in this paper to numerical testing of
indexability and computation of the index values, which then allow us to draw
conclusions about the optimality of threshold policies. We have employed an
algorithm based on the restless bandit framework ([23]), which both numerically
checks the conditions of existence and calculates the index values, if they exist.
It is a one-run algorithm (analogous to the parametric simplex method) and in each
step it calculates one of the index values. Thus, it performs $N_k$ steps, and the
overall computational complexity of the algorithm is $O(N_k^3)$.

We have tested indexability over a large number of flows with
different parameters. The algorithm always confirmed that the flow was indexable.
These tests give us evidence to conjecture that the flows as defined in
this paper are always indexable. However, the complexity of the problem
prevented us from finding a structure that could be exploited for establishing
indexability analytically. We will illustrate the difficulty in the next subsections.

6.2 One-, Two-, and Three-State Flows 

In this subsection we provide analytical results showing that optimality of
threshold policies depends on the values of parameters $\alpha$, $\beta$,
$N_k$, and $\gamma_k$.

It is obvious (see, e.g., [2]) that a one-state flow (i.e., $N_k = 1$) is
indexable and solvable by threshold policies. The index value for the unique
state is $\nu_{k,1} = R_{k,1}/W_{k,1}$. Similarly, [TJ] proved that a two-state
flow (i.e., $N_k = 2$) is indexable and solvable by threshold policies. The
index values for the two states are

$$\nu_{k,1} = \frac{R_{k,1}}{W_{k,1}}, \qquad
\nu_{k,2} = \frac{R_{k,2} + \beta (R_{k,2} - R_{k,1})}{W_{k,2} + \beta (W_{k,2} - W_{k,1})}.$$

Finally, [13] proved that a three-state flow (i.e., $N_k = 3$) is indexable and
solvable by threshold policies if a condition holds that is equivalent to
$\gamma_k < 2/3$. Under such a condition, the index values for the three states
are

$$\nu_{k,1} = \frac{R_{k,1}}{W_{k,1}}, \qquad
\nu_{k,2} = \frac{R_{k,2} + \beta (R_{k,2} - R_{k,1})}{W_{k,2} + \beta (W_{k,2} - W_{k,1})},$$

$$\nu_{k,3} = \frac{R_{k,3} + \beta (R_{k,3} - R_{k,1}) + \beta^2 (R_{k,3} - R_{k,2})}{W_{k,3} + \beta (W_{k,3} - W_{k,1}) + \beta^2 (W_{k,3} - W_{k,2})}.$$

In all the above cases, threshold policies are optimal, because the index
values are nonincreasing (due to concavity of the generalized
$\alpha$-fairness function).
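The closed-form expressions for flows with up to three states can be evaluated directly. The Python sketch below does so under explicit assumptions: the function name and the numerical inputs are hypothetical, each denominator is assumed to have the same form as its numerator with $W_{k,n}$ in place of $R_{k,n}$, and the three-state formula is the one stated for the case $\gamma_k < 2/3$ only.

```python
from typing import List, Sequence


def small_flow_index_values(R: Sequence[float], W: Sequence[float],
                            beta: float) -> List[float]:
    """Index values of a flow with one, two or three states.

    R[n - 1] and W[n - 1] are assumed to hold R_{k,n} and W_{k,n}.
    For three states the formula is the one given for gamma_k < 2/3.
    """
    n_states = len(R)
    if n_states != len(W) or n_states not in (1, 2, 3):
        raise ValueError("R and W must have the same length, between 1 and 3")
    # State 1: plain reward-to-work ratio.
    nu = [R[0] / W[0]]
    if n_states >= 2:
        nu.append((R[1] + beta * (R[1] - R[0]))
                  / (W[1] + beta * (W[1] - W[0])))
    if n_states == 3:
        nu.append((R[2] + beta * (R[2] - R[0]) + beta ** 2 * (R[2] - R[1]))
                  / (W[2] + beta * (W[2] - W[0]) + beta ** 2 * (W[2] - W[1])))
    return nu


# Illustrative inputs only (concave rewards, linear work); not from the paper.
example_nu = small_flow_index_values(R=[1.0, 1.5, 1.8], W=[1.0, 2.0, 3.0], beta=0.5)
example_is_nonincreasing = all(a >= b for a, b in zip(example_nu, example_nu[1:]))
```

With these illustrative inputs the three values decrease in the state, which is the situation in which threshold policies are optimal.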

It is tedious but straightforward to show (by a detailed inspection of the
algorithm that computes the indices) that $\gamma_k \ge 2/3$ if and only if the
index values for the three states are as follows. If $\alpha < 1$, then

$$\nu_{k,1} = \frac{R_{k,1}}{W_{k,1}}, \qquad
\nu_{k,2} = \frac{R_{k,2} - \beta R_{k,1}}{W_{k,2} - \beta W_{k,1}}, \qquad
\nu_{k,3} = \frac{R_{k,3} + \beta (R_{k,3} - R_{k,2})}{W_{k,3} + \beta (W_{k,3} - W_{k,2})}.$$

In this case, threshold policies are optimal, because the index values are
nonincreasing. On the other hand, if $\alpha > 1$, then for some values of
$\beta$ the index values are as above, but for other values of $\beta$ the
index values are

$$\nu_{k,1} = \frac{R_{k,1}}{W_{k,1}}, \qquad
\nu_{k,2} = \frac{R_{k,2} + \beta (R_{k,3} - R_{k,1}) + \beta^2 (R_{k,3} - R_{k,2})}{W_{k,2} + \beta (W_{k,3} - W_{k,1}) + \beta^2 (W_{k,3} - W_{k,2})}, \qquad
\nu_{k,3} = \frac{R_{k,3} - \beta^2 R_{k,1}}{W_{k,3} - \beta^2 W_{k,1}}. \tag{2}$$

In this case, we have $\nu_{k,1} > \nu_{k,3} > \nu_{k,2}$, therefore threshold
policies are not optimal in general. In particular, threshold policy
$\mathcal{S}_2$ is never optimal.

6.3 General Flows

In this subsection we illustrate using numerical results that the structure of
optimal policies and the index values depend heavily on the values of
parameters $\alpha$, $\beta$, $N_k$, and $\gamma_k$. This feature brings
complexity to the mathematical analysis of the problem but, on the other hand,
it shows that indices nicely capture the nature of different TCP variants under
different optimization criteria. Nevertheless, the figures provide insight into
what an optimal admission control policy is.

Figure 3: Index values as a function of congestion window for different values
of parameters. Panels: (a) different values of $\alpha$; (b) different values
of $\beta$; (c) different values of $\gamma_k$; (d) different values of
$\gamma_k$ (zoomed in).

As the basic instance, we set $\alpha := 1$, $\beta := 0.9999$,
$\gamma_k := 1/2$ and $N_k := 70$. In Figure 3 we present index values as a
function of congestion window for (a) different values of $\alpha$, (b)
different values of $\beta$, and (c)-(d) different values of $\gamma_k$. We
have observed that different values of $N_k$ may also influence the index
values, especially if both $\beta$ and $\gamma_k$ are large, but the
differences are often not noticeable in the figures, so they are omitted.

First of all, it can be seen in Figure 3(a)-(b) that as $\alpha$ or $\beta$
diminish, threshold policies become optimal, because the index values are
nonincreasing, as opposed to the non-monotone (zig-zag) index obtained when
both $\alpha$ and $\beta$ are large. A similar behavior can be observed also
for other values of $\gamma_k$, which are not reported in this paper.
Figure 3(a) further indicates that index values are decreasing in $\alpha$ and
their slopes for a given $\alpha$ are "more convex", so a higher $\alpha$
ensures a stronger discouragement of dropping packets belonging to flows with a
smaller current congestion window. Note that the case $\alpha = 0$ gives all
the index values equal to 1, so no flow is prioritized over other ones, which
is analogous to the DropTail policy. Figure 3(b) indicates that index values
are decreasing in $\beta$, so longer flows get lower priority for transmission
than shorter flows with the same current congestion window.

Figure 3(c)-(d) offers further insights. The index functions for
$\gamma_k = 0$ and $\gamma_k = 0.99$ are smooth and nonincreasing, as is,
rather surprisingly, the one for $\gamma_k = 1/3$. The remaining cases result
in "tooth-shaped" index functions, with remarkably different tooth widths (akin
to periods) of two ($\gamma_k = 1/2$), three ($\gamma_k = 2/3$), and six
($\gamma_k = 1/6$ and $5/6$, which look similar, with their period point
shifted), which may happen because the $\gamma_k$'s are multiples of 1/6 in
this figure. In spite of that, index values of different $\gamma_k$'s coincide
or come very close to each other at congestion windows that are multiples of
these periods. At multiples of six (e.g., 6, 12, ...), the index values are
increasing in $\gamma_k$ (at least over the range 1/3, 1/2, 2/3, 5/6), while at
the subsequent points (e.g., 7, 13, ...) the index values are decreasing in
$\gamma_k$, so the priority ordering of flows with such congestion windows is
completely reversed.

We underline that Figure 3(c)-(d) suggests that the index function for
$\gamma_k = 0$ could be a reasonable smooth approximation of the index
functions for the remaining values, especially for $\gamma_k = 1/3$. This is of
special interest from a practical point of view, because the index function for
$\gamma_k = 0$ is known in closed form due to [13]. On the other hand, the same
figure also suggests that the index function for $\gamma_k = 0.99$ (which is
essentially a birth-death dynamics) is a lower bound for the index functions
for the remaining values of $\gamma_k$; it is also known in closed form due to
[5]. However, $\gamma_k = 0$ does not necessarily provide an upper bound. Such
an upper bound could be obtained by using the myopic index, i.e., by setting
$\beta = 0$. Note also that as $\beta$ gets closer to 0, numerical differences
between index values for different $\gamma_k$'s become smaller.



7 Simulation results

In this section we present experimental results from implementing the index
policy in simulations. We assume that TCP sources include in the header of each
packet the index value corresponding to the current cwnd. Based on the
mathematical results we define the following heuristic index policy to be
implemented in Internet routers:

Heuristic index policy at packet level: Upon a packet arrival, if the buffer is
not full, then accept the packet. Otherwise, drop the packet (either the new
one or one from the queue) with the smallest index value. In case of ties, drop
the packet that has been in the queue the longest.
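The packet-level rule above can be sketched in a few lines of Python. This is a minimal illustration, not the NS-3 implementation used for the simulations: the class and field names (`Packet`, `IndexPolicyBuffer`, `index_value`) are hypothetical, the buffer capacity is counted in packets, and the queue is served in FIFO order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    flow_id: int
    index_value: float  # assumed to be written into the header by the TCP source


class IndexPolicyBuffer:
    """FIFO buffer that drops the packet with the smallest index value when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least one packet")
        self.capacity = capacity
        self.queue: List[Packet] = []  # oldest packet first

    def enqueue(self, packet: Packet) -> Optional[Packet]:
        """Offer a packet to the buffer; return the dropped packet, if any."""
        if len(self.queue) < self.capacity:
            self.queue.append(packet)
            return None
        # Candidates are ordered oldest first, with the arriving packet last.
        candidates = self.queue + [packet]
        # min() returns the first minimal position, i.e. the oldest packet among ties.
        victim_pos = min(range(len(candidates)),
                         key=lambda i: candidates[i].index_value)
        if victim_pos == len(self.queue):
            return packet  # the arriving packet itself is dropped
        victim = self.queue.pop(victim_pos)
        self.queue.append(packet)
        return victim

    def dequeue(self) -> Optional[Packet]:
        """Serve the packet at the head of the queue."""
        return self.queue.pop(0) if self.queue else None


# Small usage example with illustrative index values.
buf = IndexPolicyBuffer(capacity=2)
p1, p2, p3 = Packet(1, 0.5), Packet(2, 0.9), Packet(1, 0.7)
buf.enqueue(p1)
buf.enqueue(p2)
dropped_first = buf.enqueue(p3)              # buffer full: p1 has the smallest index
dropped_second = buf.enqueue(Packet(2, 0.1)) # the arrival has the smallest index
p5 = Packet(2, 0.7)
dropped_tie = buf.enqueue(p5)                # tie between p3 and p5: older p3 is dropped
```

Keeping the candidates in arrival order makes the tie-breaking rule implicit: the first minimum found is always the packet that has waited the longest.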

We have employed and modified TCP New Reno in the NS-3 simulator ([1]) to
obtain the results in several scenarios. The main objective of this section is
to show implementability of the proposed index policy, to give fundamental
insights about its effect, and to evaluate its possible gains over the DropTail
and RED policies. As the measure of fairness we employ Jain's fairness index,
whose value ranges from 1 (perfect fairness) down to 1/K in a K-user system
([16]).
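For reference, Jain's fairness index of allocations $x_1, \dots, x_K$ is $(\sum_i x_i)^2 / (K \sum_i x_i^2)$. The short Python sketch below evaluates it; the function name and the sample allocations are hypothetical and serve only to illustrate the two extreme values 1 and 1/K.

```python
from typing import Sequence


def jain_fairness_index(allocations: Sequence[float]) -> float:
    """Jain's fairness index: (sum x_i)^2 / (K * sum x_i^2)."""
    if not allocations:
        raise ValueError("at least one user is required")
    total = sum(allocations)
    sum_of_squares = sum(x * x for x in allocations)
    if sum_of_squares == 0:
        raise ValueError("the index is undefined when all allocations are zero")
    return total * total / (len(allocations) * sum_of_squares)


# Illustrative allocations for a two-user system (K = 2).
equal_share = jain_fairness_index([1.0, 1.0])  # perfect fairness
one_starved = jain_fairness_index([1.0, 0.0])  # worst case, equals 1/K
uneven = jain_fairness_index([3.0, 1.0])
```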
Scenario  Policy    Utilization             User fairness               Mean Queue Size  RTT
Baseline  DropTail  97.14%                  0.999999                    6.98             61.4
Baseline  RED       97.63%                  0.982572                    6.11             58.77
Baseline  Index     98.22% (+1.1%, +0.6%)   0.999798 (-0.1%, +1.8%)     8.07             64.7 (+5.2%, +10.2%)
1         DropTail  81.16%                  1.000000                    2.41             47.4
1         RED       81.71%                  0.998233                    2.54             47.8
1         Index     88.45% (+7.3%, +6.7%)   0.999837 (-0.1%, +0.1%)     2.89             48.8 (+2.3%, +2%)
2         DropTail  90.84%                  0.786535                    7.1              61.8
2         RED       92.74%                  0.911862                    6.04             58.5
2         Index     95.37% (+4.5%, +2.6%)   0.962784 (+22.3%, +5.5%)    7.34             62.5 (+1.1%, +6.8%)
3         DropTail  86.36%                  0.739875                    6.63             60.3
3         RED       94.08%                  0.821966                    6.31             59.4
3         Index     96.86% (+10.5%, +2.8%)  0.917895 (+24.5%, +11.6%)   6.84             62.5 (+3.6%, +5.2%)
4         DropTail  93.84%                  0.765318                    6.11             see Section 7.4
4         RED       91.6%                   0.89428                     5.81             see Section 7.4
4         Index     94.97% (+1.1%, +3.3%)   0.929756 (+21.4%, +3.9%)    6.37             see Section 7.4

Table 1: Utilization of the bottleneck queue, Jain's fairness index between
users (over the 20-second interval), the mean queue size (in number of packets)
of the bottleneck buffer, and the RTT of a connection of one user (in ms) in
the simulations for different scenarios. In parentheses, the improvement of the
index policy with respect to DropTail and RED, respectively.

We focus on the case $\alpha = 1$, since it was shown that loss networks (like
the current Internet with DropTail) maximize the aggregate sum of logarithmic
utilities of the time-average transmission rates ([18]). Note that our approach
(with $\alpha = 1$) maximizes the aggregate sum of the time-average logarithmic
utilities of the immediate transmission rates.

The access link of each user has a bandwidth of 5 Mb/s and a delay of 10 ms.
The delay of the bottleneck link is 10 ms and its bandwidth capacity is
1500 kb/s. The packet size is 576 bytes. We set $\beta = 0.9999$, which
approximates the time-average criterion. We set the maximum value of the
congestion window of each user to $N_k = 70$, which is required to compute the
index values.

For each of the five scenarios below, we plot the time evolution under the
DropTail, RED and index policies of the following elements: the size of the
queue in the router buffer, the congestion window of each of the users, and
(only for the index policy) the index value of each of the users.

In all figures of the simulation section, user 1 is depicted with a thin black
line, and user 2 with a thick solid black line.

The results are summarized in Table 1.

7.1 Scenario (Baseline): Two Symmetric Users

As a baseline scenario, we consider two symmetric (equal) users that are
sending data to a server through a bottleneck router. Each user $k = 1, 2$
halves her congestion window, i.e., $\gamma_k = 1/2$.

Figure 4: Scenario (Baseline): Simulation of a bottleneck with two equal users
with standard TCP ($\gamma_1, \gamma_2 = 0.5$). Panels: (a) congestion window
of the two users (top) and buffer queue size (bottom) in the DropTail router;
(b) congestion window of the two users (top) and buffer queue size (bottom) in
the RED router; (c) index values of the two users (top), congestion windows of
the two users (middle) and buffer queue size (bottom) in the index-policy
router.

The buffer size is set to the bandwidth-delay product of a single user,

$$\frac{1500 \cdot 10^{3}\ \mathrm{b/s} \cdot 2 \cdot (10^{-2}\ \mathrm{s} + 10^{-2}\ \mathrm{s})}{576\ \mathrm{B} \cdot 8\ \mathrm{b/B}} \approx 13.$$
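The buffer-size arithmetic can be reproduced with a few lines of Python. The sketch assumes, as in the setup described above, a 1500 kb/s bottleneck, one-way delays of 10 ms on the access link and 10 ms on the bottleneck link, and 576-byte packets; the function name is hypothetical.

```python
from typing import Sequence


def bandwidth_delay_product_packets(bottleneck_bps: float,
                                    one_way_delays_s: Sequence[float],
                                    packet_bytes: int) -> float:
    """Bandwidth-delay product of a single user, expressed in packets."""
    round_trip_s = 2 * sum(one_way_delays_s)  # propagation delay, both directions
    return bottleneck_bps * round_trip_s / (packet_bytes * 8)


# 1500 kb/s bottleneck, 10 ms access delay + 10 ms bottleneck delay, 576-byte packets.
bdp = bandwidth_delay_product_packets(1500e3, [10e-3, 10e-3], 576)
buffer_size = round(bdp)
```

Rounding the result to the nearest integer gives the buffer size of 13 packets used in the baseline scenario.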



We present the evolution of the congestion window and the size of the router
queue over time for the DropTail case in Figure 4(a). As expected due to the
well-known synchronization phenomenon, the two users are completely
synchronized.
We depict the evolution of the congestion window and the size of the queue
under the RED policy in Figure 4(b). We observe that the users become
desynchronized and, in this instance, the congestion window does not exhibit
periodic behavior because packets are dropped randomly.

We show the evolution of the indices, the congestion window and the size of the
router queue over time for the index policy in Figure 4(c). Interestingly, we
can observe that users become ideally desynchronized and, as a consequence, the
utilization of the buffer increases. As can be seen from Table 1, the
throughput increases by 1.1% and 0.6% with respect to the DropTail and RED
policies, respectively. User fairness remains essentially the same as under
DropTail, and it improves compared with RED.

7.2 Scenario 1: Reducing Buffer Size 

In this scenario we analyze the influence of setting a smaller buffer size in
the bottleneck queue for the index policy, DropTail and RED. To investigate
this effect, we set the buffer size of the router to 6 packets.

As we can see in Figure 5(a), with DropTail the users are synchronized and the
buffer is empty for more of the time than in the previous scenario.

In Figure 5(b) we observe again that the congestion window with RED does not
change periodically.

In the case of the index policy (see Figure 5(c)) the users are desynchronized,
and as a result the number of delivered packets is larger than under DropTail
and also larger than under RED. We note that the fairness among users does not
change much in this case. We can also observe that the utilization increases by
7.3% and 6.7% with respect to DropTail and RED, but the RTT increases by only
2.3% and 2%, respectively.

The main conclusion of this scenario is that, with a buffer size smaller than
the bandwidth-delay product, the throughput of the index policy is larger than
that of DropTail and RED.

7.3 Changing Multiplicative Decrease Factor 

We illustrate the inter-protocol properties of the index policy. In Scenario 2
we change the multiplicative decrease factor of user 1 to $\gamma_1 = 0$, and
in Scenario 3 to $\gamma_1 = 0.9$.

7.3.1 Scenario 2: 

In this setting user 1 is conservative compared to user 2, and reinitializes
the congestion window to 1 every time a packet is lost, i.e., $\gamma_1 = 0$.

We observe in Figure 6(a) that with DropTail the congestion window of user 2 is
consistently bigger. This implies that the number of packets delivered by user
2 is much larger, as can be seen in Table 1.
Figure 5: Scenario 1: Simulation of a bottleneck link with two equal users with
standard TCP and a buffer size smaller than the bandwidth-delay product.
Panels: (a) congestion window of the two users (top) and buffer queue size
(bottom) in the DropTail router; (b) congestion window of the two users (top)
and buffer queue size (bottom) in the RED router; (c) index values of the two
users (top), congestion windows of the two users (middle) and buffer queue size
(bottom) in the index-policy router.

In Figure 6(b) we see that once again RED causes desynchronized behavior of the
users. Besides, the RED policy improves fairness among users with respect to
DropTail, as we show in Table 1.

With the index policy the congestion window of user 1 is reduced less often
than that of user 2 (see Figure 6(c)). In this case, users are completely
desynchronized and Jain's fairness value is higher: it improves with respect to
DropTail and RED by 22.3% and 5.5%, respectively. At the same time, the
utilization of the index policy is larger, so that the total number of
delivered packets is increased by 4.5% and 2.6% compared with DropTail and RED,
while the RTT of one user increases by 1.1% and 6.8%, respectively.

Figure 6: Scenario 2: Simulation of a bottleneck link with two users, where
User 1 has restarting TCP ($\gamma_1 = 0$) and User 2 has standard TCP
($\gamma_2 = 0.5$). Panels: (a) congestion window of the two users (top) and
buffer queue size (bottom) in the DropTail router; (b) congestion window of the
two users (top) and buffer queue size (bottom) in the RED router; (c) index
values of the two users (top), congestion windows of the two users (middle) and
buffer queue size (bottom) in the index-policy router.

From this scenario, we conclude that the index policy improves user fairness
and throughput for users with different TCP models.
Figure 7: Scenario 3: Simulation of a bottleneck link with two users, where
User 1 has a TCP with $\gamma_1 = 0.9$ and User 2 has standard TCP
($\gamma_2 = 0.5$). Panels: (a) congestion window of the two users (top) and
buffer queue size (bottom) in the DropTail router; (b) congestion window of the
two users (top) and buffer queue size (bottom) in the RED router; (c) index
values of the two users (top), congestion windows of the two users (middle) and
buffer queue size (bottom) in the index-policy router.



7.3.2 Scenario 3: 

User 1 is now much more aggressive than user 2, since it barely reduces its
congestion window in response to a congestion event, i.e., $\gamma_1 = 0.9$.

With DropTail we observe that user 1 has a significantly bigger cwnd at every
moment (see Figure 7(a)). This illustrates that DropTail is not fair in this
setting.

In Figure 7(b) we see that the RED policy is not fair either, because the
congestion window of user 1 is always higher than the congestion window of
user 2.
On the other hand, we see in Figure 7(c) that with the index policy the
difference in the congestion window between the two users is not big, which
results in a significantly better Jain's fairness index value: it is increased
by 24.5% and 11.6% compared with DropTail and RED, respectively. At the same
time, the utilization is increased by 10.5% and 2.8% with respect to DropTail
and RED, and the RTT of one user increases by 3.6% and 5.2%.

The main contribution of this scenario consists in showing that under the index
policy the user fairness is higher than under DropTail and RED when one user is
more aggressive than the other.



7.4 Scenario 4: Changing Propagation Delay 

In this scenario we modify the propagation delay of the access link of user 1
and set it to 50 ms.

In this setting, the congestion windows of both users under DropTail, RED and
the index policy become very similar. The main difference is that with the
index policy the users again become "ideally" desynchronized, while the
behavior of the congestion window under DropTail and RED is more random. As we
can see from Jain's index in Table 1, the user fairness under the index policy
is 21.4% and 3.9% larger than under the DropTail and RED policies,
respectively.

We now report the RTT of each user under DropTail, RED and the index policy.
The values of the RTT of user 1 for DropTail, RED and the index policy are
138.8 ms, 137.8 ms and 141 ms, respectively. For this user, the increase of the
RTT under the index policy with respect to DropTail and RED is 1.5% and 2.3%.
On the other hand, the values of the RTT of user 2 for DropTail, RED and the
index policy are 58.8 ms, 57.8 ms and 61 ms, respectively, so the increase of
the RTT of this user under the index policy with respect to DropTail and RED is
3.7% and 5.5%.

According to the results obtained in this scenario, we conclude that the index
policy has the property of RTT fairness and achieves higher throughput than the
DropTail and RED policies.



8 Conclusion 

In this paper we have introduced a rigorous framework of Markov Decision
Processes for the problem of optimal queue management in a bottleneck router in
order to achieve fast and fair transmission. We have focused on showing
tractability of the model and designed a heuristic by means of transmission
indices that can be implemented for packet-level admission to the buffer.

The main goal has been to study in a benchmark topology the fundamental
features and performance of the proposed heuristic for flows that behave
according to the TCP/IP protocol. A noticeable feature is that the proposed
heuristic manages to desynchronize (and even counter-synchronize) the flows, so
that network resources can be used more efficiently. We have shown in NS-3
simulations that the proposed heuristic significantly improves over the
DropTail and RED policies in several aspects, including fairness across users
with different TCP variants and fairness with respect to different round-trip
times. In scenarios where DropTail and RED performed well in these types of
fairness, the proposed heuristic maintained the same levels of fairness and
improved the throughput at the same time. We also believe that we provide a
fundamental framework and ideas that are useful or directly applicable also in
next-generation Internet architectures such as Information-Centric Networking
(ICN) and wireless networks.

Figure 8: Scenario 4: Simulation of a bottleneck link with two users, where the
propagation delay of User 1 is 50 ms. Panels: (a) congestion window of the two
users (top) and buffer queue size (bottom) in the DropTail router; (b)
congestion window of the two users (top) and buffer queue size (bottom) in the
RED router; (c) index values of the two users (top), congestion windows of the
two users (middle) and buffer queue size (bottom) in the index-policy router.

We have made several simplifying assumptions in the modeling of TCP. However,
we believe that our approach opens an interesting research avenue for designing
admission control policies in Internet routers. In order to fully validate our
approach, future work must address some of the limitations of our model,
namely, the efficient computation of index values, the impact of misbehaving
TCP sources, and simulations in more realistic scenarios.